ABSTRACT

Noise is present in the wide variety of signals obtained from sleep patients. This noise comes from a number
of sources, from the presence of extraneous signals to adjustments in signal amplification and shot noise in the
circuits used for data collection. The noise needs to be removed in order to maximize the information gained
about the patient using both manual and automatic analysis of the signals. Here we evaluate a number of new
techniques for removal of that noise, and for the associated problem of separating the original signal sources.
1. INTRODUCTION

Electroencephalograph (EEG) and electrooculograph (EOG) measurement techniques provide valuable information
on sleep disorders. Recent studies looking at memory and learning during sleep have used these
techniques as predictors of waking performance. 1, 3, 4 Comparison of thoracic and abdominal movements associated
with breathing can reveal important information about breathing disorders and events such as apneas
and hypopneas during sleep. 5-7

The process by which the EEG and EOG signals are recorded is described by Telpan, 8 but a brief summary 
is given here. The EEG and EOG signals are recorded by placing electrodes on the patient's scalp. These 
detect electric potentials generated by the flow of ions in neural cells that set up electric dipoles between the 
body of the neuron (soma) and the neural branches (apical dendrites). For the data we used, these signals 
were amplified, then digitized at 250 Hz for the EEG and 50 Hz for the EOG signal. We estimate the mutual
information between the second EEG channel, from the left anterior position El to just below the opposite
ear, and the left EOG channel, from just to the side of the left eye to the position just above the nose between
the eyes. The signals are broken down into (typically) 30 second long epochs, which are then classified by a
human operator into various stages of sleep and wakefulness.
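
The epoching step can be sketched as follows. This is a minimal illustration, assuming the recording is held as a one-dimensional NumPy array and that any incomplete final epoch is discarded; the function name `split_into_epochs` and the treatment of the remainder are assumptions made here, not details of the recording system.

```python
import numpy as np

def split_into_epochs(signal, sample_rate_hz, epoch_seconds=30):
    """Reshape a 1-D signal into rows of one epoch each, dropping any partial epoch."""
    signal = np.asarray(signal, dtype=float)
    samples_per_epoch = int(sample_rate_hz * epoch_seconds)
    n_epochs = signal.size // samples_per_epoch
    return signal[: n_epochs * samples_per_epoch].reshape(n_epochs, samples_per_epoch)

# Hypothetical example: 95 s of EEG sampled at 250 Hz.
eeg = np.zeros(95 * 250)
epochs = split_into_epochs(eeg, sample_rate_hz=250)
```

Each row of `epochs` then holds the 7500 samples of one 30 second EEG epoch, ready to be scored.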

The main problems with analyzing the EEG and EOG signal are: 

• Notch filtering of the 50 Hz interference ripple from the signals also removes useful information.

• Skin conductances can vary over time in different ways in different locations (however the gel used helps 
prevent this problem). 

• Due to conductances across the skin, the signal received by an electrode is a mixture of the true signals 
one is trying to measure. 
The most significant problem is the mixing of signals; this can be reduced using blind signal separation
techniques based on higher order statistics. 9, 10 The noise can then be removed using wavelet transforms. 11-13
We also consider these techniques for the thoracic and abdominal movements, for which similar problems may
arise. 7

Time and power spectra plots for one of the EOG and EEG sets of data are shown in Figure 1. Those for
one of the thoracic and abdominal sets of data are shown in Figure 2.
(a) The time and power spectrum plots for the first eight seconds of the EEG data. Note that there are many
higher frequency signals superimposed on lower frequency signals, giving the appearance of noise; however,
this is important signal information that needs to be preserved. Note also that some of the EOG signal is
present in this signal.



(b) The time and power spectrum plots for the first eight seconds of the EOG data. Note the absence of the
high frequency components that are present in the EEG signal, since we are concerned here with low frequency
muscle movement signals. The sampling rate used was correspondingly lower (50 Hz as opposed to 250 Hz for
the EEG).



Figure 1. The time and power spectra plots for the first eight seconds of the EEG and EOG data. Note the spectral 
differences between the two, with the EEG having many higher frequency components. 



2. METHODS 

There are several problems to solve in eliminating noise from the signals. For the EOG and
EEG signals, we must first ensure that the two signals have the same number of data points. To
do this we use a Gaussian smoothing procedure, which is detailed in subsection 2.1. Other related smoothing
procedures could be used; here we assume the distribution of the signal data is Gaussian, which is reasonably
true for the data we use. We then have the problem of separating the sources from the observed signals, which
contain a mixture of both. Three algorithms for this are detailed in subsections 2.4 to 2.6. The noise is then
removed from both the sources using wavelet transforms, as elaborated on in subsection 2.7. A flowchart of
the process is shown in Figure 3.

To evaluate the performance of the blind signal separation used to separate the source data, and to evaluate
the noise removal, we use an efficient algorithm, given in subsection 2.8, to estimate the mutual information
between two signals. We expect this measure to decrease when comparing the original signals with the
separated signals, assuming greater independence between the separated signals, and to remain the same when
comparing the noisy signals with those where the noise has been removed, assuming the noise is uncorrelated
between the two signals. This is largely true for the signals of interest, although there may be some information
at certain frequencies that is correlated due to extraneous electromagnetic signals being received by the leads,
as they act as antennas. This is kept to a minimum through appropriate grounding.

Figure 2. Time and power spectra plots for the first eight seconds of the thoracic and abdominal breathing data. The
sampling rate was 25 Hz, allowing the capture of relatively slow breathing signals. Note the thoracic signal has a slight
phase lead over the abdominal breathing signal.

2.1. Gaussian smoothing 

The EEG signal has a sampling rate five times higher than the EOG (250 Hz to 50 Hz). These are recorded 
simultaneously, so every fifth time point in the EEG corresponds to a time point of the EOG signal. We use 
Gaussian smoothing to reduce the number of data points in the EEG by a factor of five: 
where the weights $w(x_j, x_i)$ are

$$w(x_j, x_i) = e^{-(x_j - x_i)^2 / (2\sigma^2)}, \tag{2}$$

$i$ is the discrete time point we are calculating the smoothed average for, and $\sigma^2$ is the estimate of variance
for the entire set of samples in the EEG signal. Now that the two sets of data have the same number of points,
they can be written as $x(t) = [x_1(t), x_2(t)]$, and we can apply blind signal separation using the following
model of the data.
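
A minimal sketch of this down-sampling step is given below. It uses the weights of Equation 2, but the normalized weighted-average form, the window of two samples either side of each retained point, and the function name `gaussian_downsample` are assumptions made for illustration only.

```python
import numpy as np

def gaussian_downsample(x, factor=5):
    """Reduce x by `factor` using Gaussian-weighted averages around every factor-th sample."""
    x = np.asarray(x, dtype=float)
    centres = np.arange(0, x.size, factor)   # samples that coincide with the slower signal
    sigma2 = x.var()                         # variance estimate over the whole signal
    if sigma2 == 0.0:                        # constant signal: nothing to smooth
        return x[centres].copy()
    half = factor // 2                       # assumed window: `half` samples either side
    smoothed = np.empty(centres.size)
    for out_idx, i in enumerate(centres):
        window = x[max(0, i - half): i + half + 1]
        # Weights depend on the difference in signal value, as in Equation 2.
        weights = np.exp(-(window - x[i]) ** 2 / (2.0 * sigma2))
        smoothed[out_idx] = np.sum(weights * window) / np.sum(weights)
    return smoothed

# Hypothetical example: a 20-sample ramp is reduced to 4 samples.
ramp = np.arange(20.0)
reduced = gaussian_downsample(ramp)
```

Applied to the 250 Hz EEG with `factor=5`, a function of this kind returns one value per 50 Hz EOG sample.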

2.2. Data model for blind signal separation 

We write the original, m-dimensional source data as $s(t) = [s_1(t), s_2(t), \ldots, s_m(t)]^T$. It is assumed that the
sources are independent. We then consider an unknown linear model $A_{n \times m}$ generating the observed signals,
written as an n-dimensional vector $x(t) = [x_1(t), x_2(t), \ldots, x_n(t)]^T$, by

$$x(t) = A s(t), \tag{3}$$

where $A$ is referred to as the mixing matrix. The columns of $A$ can be swapped together with the corresponding
sources, and a source can be rescaled if the corresponding column of $A$ is rescaled by the inverse factor, so there
is an ambiguity in both the permutation (labeling) of the sources and the scaling of the sources. With this
model of the data, we can now apply blind signal separation techniques.

Figure 3. Flowchart showing the general steps involved in going from the original sources to the cleaned, estimated
sources. The optional Gaussian smoothing step is not shown.
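
The permutation and scaling ambiguity can be checked numerically. The sketch below uses an arbitrary mixing matrix and random sources chosen purely for illustration; none of the values come from the recorded data.

```python
import numpy as np

rng = np.random.default_rng(0)
s = rng.standard_normal((2, 1000))        # two hypothetical sources, one per row
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])                # illustrative mixing matrix
x = A @ s                                 # observed signals, x(t) = A s(t)

P = np.array([[0.0, 1.0],
              [1.0, 0.0]])                # permutation: swap the two sources
D = np.diag([2.0, -0.5])                  # rescale the sources (and flip one sign)

s_alt = D @ P @ s                         # relabelled, rescaled sources
A_alt = A @ np.linalg.inv(D @ P)          # compensating change to the mixing matrix
x_alt = A_alt @ s_alt                     # gives exactly the same observations
```

Since `x` and `x_alt` are identical, no algorithm that sees only the observations can distinguish the pair (`A`, `s`) from (`A_alt`, `s_alt`).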

2.3. Blind signal separation 

There are two key blind signal separation approaches that are combined to form the joint cumulant and
correlation (JCC) algorithm in subsection 2.6. They are the second order blind identification (SOBI) algorithm,
discussed in subsection 2.4, and the joint approximate diagonalization of eigenmatrices (JADE), discussed in
subsection 2.5. Both approaches have a common first step, in which the data is whitened using a sphering
matrix $W$, which transforms the mixing matrix $A$ into a unitary matrix $U = WA$, that is, a matrix for which
$UU^T = I$. 9, 10 The next step of estimating $A$ is dependent on the choice of algorithm and is detailed below.
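
The whitening step can be sketched as follows, assuming the sphering matrix is built from the eigendecomposition of the sample covariance (one common construction, not necessarily the one used for the results here). The test sources and mixing matrix are illustrative.

```python
import numpy as np

def whitening_matrix(x):
    """Return W such that W @ x has identity sample covariance (x is channels x samples)."""
    x = x - x.mean(axis=1, keepdims=True)
    cov = x @ x.T / x.shape[1]
    eigvals, eigvecs = np.linalg.eigh(cov)
    return np.diag(eigvals ** -0.5) @ eigvecs.T

n = 1000
t = np.arange(n)
# Two unit-variance, uncorrelated test sources (whole numbers of cycles in the record).
s = np.sqrt(2.0) * np.vstack([np.sin(2 * np.pi * 20 * t / n),
                              np.sin(2 * np.pi * 70 * t / n)])
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])                # illustrative mixing matrix
W = whitening_matrix(A @ s)
U = W @ A                                 # whitening turns A into an orthogonal matrix
```

For these sources `U @ U.T` equals the identity to numerical precision, which is the property the subsequent algorithms rely on.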

2.4. SOBI algorithm 

Given a hypothesis of sources with different spectra and the linear model of Equation 3, we can calculate 
time-delayed, cross-correlation matrices, 
for $E[\cdot]$ the expectation operator. The correlation matrices can then be whitened,

$$\underline{R}(\tau) = W R(\tau) W^T = U R_s(\tau) U^T, \tag{6}$$

$\forall \tau \neq 0$. The unitary matrix $U$ is then obtained by joint diagonalization of the set of $p$ whitened
correlation matrices $\{\underline{R}(\tau_i) \mid i = 1, \ldots, p\}$. 9 The matrix $U$ can only be uniquely determined
iff for any pair of sources $i \neq j$ there exists at least one lag $\tau_k$ such that
$E[s_i(t) s_i(t - \tau_k)] \neq E[s_j(t) s_j(t - \tau_k)]$. 9 The mixing matrix is then estimated by
$\hat{A} = W^{\#} U$, where $W^{\#}$ is the pseudo-inverse of $W$. An alternative
to the SOBI algorithm is the JADE algorithm.
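
As a simplified sketch of the idea, the code below handles the special case of a single lag ($p = 1$), for which joint diagonalization reduces to an ordinary eigendecomposition (a variant often called AMUSE). It is not the full multi-lag SOBI algorithm; the lag, the test sources and the function name `sobi_single_lag` are illustrative assumptions.

```python
import numpy as np

def sobi_single_lag(x, lag=5):
    """Estimate sources and mixing matrix from one whitened, time-delayed correlation matrix."""
    x = x - x.mean(axis=1, keepdims=True)
    n = x.shape[1]
    # Whitening: W @ x has identity sample covariance.
    eigvals, eigvecs = np.linalg.eigh(x @ x.T / n)
    W = np.diag(eigvals ** -0.5) @ eigvecs.T
    z = W @ x
    # Whitened time-delayed correlation matrix, symmetrised before diagonalizing.
    R = z[:, lag:] @ z[:, :-lag].T / (n - lag)
    R = (R + R.T) / 2.0
    # With a single lag, the unitary matrix U is the eigenvector matrix of R.
    _, U = np.linalg.eigh(R)
    sources = U.T @ z
    A_hat = np.linalg.pinv(W) @ U          # undo the whitening to estimate the mixing matrix
    return sources, A_hat

n = 2000
t = np.arange(n)
# Two test sources with clearly different spectra.
s = np.vstack([np.sin(2 * np.pi * 20 * t / n),
               np.sin(2 * np.pi * 140 * t / n + 0.3)])
A = np.array([[1.0, 0.6],
              [0.4, 1.0]])
est, A_hat = sobi_single_lag(A @ s)
```

The rows of `est` match the rows of `s` only up to order, sign and scale, in line with the ambiguities of the data model.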

2.5. JADE algorithm 

Here we assume the linear model of Equation 3 and assume independence of the sources. To each n-dimensional
vector $x$ is associated a quadricovariance matrix $Q: M \rightarrow N$, defined by $N = QM$ such that

where $\mathrm{Cum}(\cdot)$ is defined as

$$\mathrm{Cum}(x_i, x_j, x_k, x_l) = E[\bar{x}_i \bar{x}_j \bar{x}_k \bar{x}_l] - E[\bar{x}_i \bar{x}_j] E[\bar{x}_k \bar{x}_l] - E[\bar{x}_i \bar{x}_k] E[\bar{x}_j \bar{x}_l] - E[\bar{x}_i \bar{x}_l] E[\bar{x}_j \bar{x}_k], \tag{7}$$

and where $\bar{x}_i = x_i - E[x_i]$, etcetera. 14 As the set of $n \times n$ matrices is an $n^2$-dimensional linear space, it can be
shown that there exist $n^2$ real eigenvalues $\lambda_r$ and $n^2$ orthonormal eigenmatrices $M_r$ satisfying $Q M_r = \lambda_r M_r$. 9
It can be proved that only $n$ of the eigenvalues are non-zero, 15 and joint diagonalization of the $n$ corresponding
eigenmatrices, labeled $M_r$, gives the unitary matrix $U$. 9 As with the SOBI algorithm, the mixing matrix is
estimated by $\hat{A} = W^{\#} U$, where $W^{\#}$ is the pseudo-inverse of $W$. Combining the two algorithms gives us the JCC algorithm.
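
The fourth-order cumulant of Equation 7 can be estimated from samples by replacing each expectation with a sample mean. The sketch below does only that; the function name `cum4` and the small test signals are illustrative, and the construction of the quadricovariance itself is not shown.

```python
import numpy as np

def cum4(xi, xj, xk, xl):
    """Sample fourth-order cross-cumulant of four equally long 1-D signals."""
    # Centre each signal first, as in the definition of the barred variables.
    xi, xj, xk, xl = (np.asarray(v, dtype=float) - np.mean(v) for v in (xi, xj, xk, xl))
    mean = np.mean
    return (mean(xi * xj * xk * xl)
            - mean(xi * xj) * mean(xk * xl)
            - mean(xi * xk) * mean(xj * xl)
            - mean(xi * xl) * mean(xj * xk))

# Two small zero-mean, mutually independent test signals taking values +1 and -1.
x = np.array([1.0, 1.0, -1.0, -1.0])
y = np.array([1.0, -1.0, 1.0, -1.0])
auto = cum4(x, x, x, x)     # non-zero: x is not Gaussian
cross = cum4(x, x, y, y)    # zero: the cross-cumulant of independent signals vanishes
```

The vanishing of cross-cumulants for independent sources is the property that the JADE algorithm exploits.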



2.6. JCC algorithm 

In JCC, we use both the whitened correlation matrices $\underline{R}(\tau_i)$ used by the SOBI algorithm of subsection 2.4 and
the cumulant quadricovariance eigenmatrices, $M_r$, provided by the JADE algorithm. Joint diagonalization of
the combined set gives the unitary matrix $U$, which again gives the estimator $\hat{A} = W^{\#} U$, where $W^{\#}$ is the
pseudo-inverse of the whitening matrix $W$. Using $\hat{A}$ we separate the signals
into estimates of the original source data. We then consider removing the noise from this data, using wavelet
techniques.
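
Joint diagonalization, the step shared by SOBI, JADE and JCC, is commonly carried out with a sequence of Jacobi (Givens) rotations. The sketch below follows that general approach for real symmetric matrices; it is an assumption that a method of this kind was used here, and the function name `joint_diagonalize` and the test matrices are hypothetical.

```python
import numpy as np

def joint_diagonalize(matrices, sweeps=50, tol=1e-12):
    """Find an orthogonal V making every V.T @ M_k @ V as diagonal as possible."""
    M = np.array(matrices, dtype=float)            # copy, shape (count, n, n)
    n = M.shape[1]
    V = np.eye(n)
    for _ in range(sweeps):
        rotated = False
        for p in range(n - 1):
            for q in range(p + 1, n):
                # Collect the entries that a rotation in the (p, q) plane can change.
                g = np.stack([M[:, p, p] - M[:, q, q], M[:, p, q] + M[:, q, p]])
                gg = g @ g.T
                ton = gg[0, 0] - gg[1, 1]
                toff = gg[0, 1] + gg[1, 0]
                theta = 0.5 * np.arctan2(toff, ton + np.hypot(ton, toff))
                if abs(theta) > tol:
                    rotated = True
                    c, s = np.cos(theta), np.sin(theta)
                    G = np.eye(n)
                    G[p, p], G[p, q], G[q, p], G[q, q] = c, -s, s, c
                    M = G.T @ M @ G                # rotate every matrix in the set
                    V = V @ G                      # accumulate the overall rotation
        if not rotated:
            break
    return V, M

# Hypothetical test set: four matrices sharing the same orthogonal eigenvector matrix.
rng = np.random.default_rng(1)
U_true, _ = np.linalg.qr(rng.standard_normal((3, 3)))
mats = [U_true @ np.diag(rng.uniform(-1, 1, 3)) @ U_true.T for _ in range(4)]
V, diagonalized = joint_diagonalize(mats)
```

In a JCC-style use, the input list would hold both the whitened correlation matrices and the cumulant eigenmatrices, and `V` would play the role of $U$.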

2.7. Wavelet noise removal 

The continuous wavelet transform (CWT) of $f \in L^2(\mathbb{R})$ is described by
Mallat 16 as

where

$$\psi_{u,s}(t) = \frac{1}{\sqrt{s}} \psi\left(\frac{t - u}{s}\right) \tag{10}$$

is a family of orthogonal wavelets, $\|\psi_{u,s}\| = 1$, $\langle \psi_{u,s}, \psi_{u',s'} \rangle = 0$ for $(u,s) \neq (u',s')$, and

The scale of the wavelet may conceptually be considered the inverse of the frequency.

The CWT reveals much detail about a signal; however, due to its continuous nature it cannot be computed
for real signals on a digital computer. Therefore, the discrete wavelet transform (DWT) is normally used.
The DWT calculates the wavelet coefficients at discrete intervals of time and scale instead of at all scales.
With the DWT, a fast version of the algorithm is possible, analogous to the fast Fourier transform. This
version of the algorithm makes use of the fact that if scales and positions are chosen based on powers of
two (dyadic scales and positions) the analysis is very efficient. In 1988, Mallat developed an efficient way
to implement this algorithm, which is known as a two-channel sub-band coder. 17 For a single level of
decomposition, this algorithm passes the signal through two complementary (high-pass and low-pass) filters,
resulting in approximations, which are high-scale, low-frequency components of the signal, and details, which
are low-scale, high-frequency components of the signal. The two filter outputs together contain twice as many
data points as the input, so each is down-sampled by a factor of two. For further levels of decomposition,
successive approximations may be iteratively broken
down into details and approximations as shown in Figure 4. Coefficients below a certain level are regarded as
noise and thresholded out. Thresholding may be soft or hard. Hard thresholding is defined as

$$y = \begin{cases} x & \text{for } |x| > \theta \\ 0 & \text{for } |x| \le \theta \end{cases} \tag{12}$$

and soft thresholding as

$$y = \begin{cases} \mathrm{sign}(x)(|x| - \theta) & \text{for } |x| > \theta \\ 0 & \text{for } |x| \le \theta \end{cases} \tag{13}$$

where $x$ is the original signal, $y$ is the thresholded signal, and $\theta$ is the threshold. Hard thresholding tends
to create discontinuities at $x = \pm\theta$ because any values of the signal less than the threshold are immediately
set to zero. With soft thresholding, the thresholded values are shrunk towards zero without creating the
discontinuities. The signal may then be reconstructed, without significant loss of information, by up-sampling,
passing the approximations and details through the appropriate reconstruction
filters and combining the results. Based on SNR measures of wavelet performance, we used Daubechies wavelets
of order 5, with soft thresholding and a decomposition level of 5; although this is not the best for noise removal,
we are more interested in preservation of information when going from the estimated sources to the denoised
estimated sources.
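
The two thresholding rules, together with one level of the two-channel sub-band coder, can be sketched as below. For brevity the sketch uses the Haar wavelet and a single decomposition level rather than the order 5 Daubechies wavelet and five levels used for the results; the function names and the threshold value are illustrative assumptions.

```python
import numpy as np

def hard_threshold(x, theta):
    """Keep values with magnitude above theta; set the rest to zero."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) > theta, x, 0.0)

def soft_threshold(x, theta):
    """Shrink every value towards zero by theta, zeroing those within theta of zero."""
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.maximum(np.abs(x) - theta, 0.0)

def haar_analysis(signal):
    """One decomposition level: low-pass and high-pass filtering, then down-sampling."""
    x = np.asarray(signal, dtype=float)    # length assumed even
    approx = (x[0::2] + x[1::2]) / np.sqrt(2.0)
    detail = (x[0::2] - x[1::2]) / np.sqrt(2.0)
    return approx, detail

def haar_synthesis(approx, detail):
    """Invert haar_analysis: up-sample and combine the two channels."""
    x = np.empty(2 * approx.size)
    x[0::2] = (approx + detail) / np.sqrt(2.0)
    x[1::2] = (approx - detail) / np.sqrt(2.0)
    return x

def denoise_one_level(signal, theta):
    """Soft-threshold the detail coefficients of a one-level Haar decomposition."""
    approx, detail = haar_analysis(signal)
    return haar_synthesis(approx, soft_threshold(detail, theta))

coeffs = np.array([-3.0, -1.0, 0.5, 2.0])
hard = hard_threshold(coeffs, 1.0)
soft = soft_threshold(coeffs, 1.0)
```

With a threshold of zero, `denoise_one_level` returns the input unchanged, which illustrates that the decomposition and reconstruction filters by themselves lose no information.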



To evaluate the performance of the above techniques we introduce a measure of mutual information. 
Figure 4. This figure illustrates how (a) the discrete wavelet transform decomposes a signal into details and
approximations, iteratively decomposing the approximations; (b) wavelet packets iteratively decompose both the
approximations and details.



2.8. MI estimation algorithm 

The following is an outline of the method we use in calculating the mutual information between the EEG and
EOG signals, as given in Kraskov et al. 18

Mutual information for two signals $X$ and $Y$ is defined as

$$I(X, Y) = \iint \mu(x, y) \log \frac{\mu(x, y)}{\mu_x(x)\,\mu_y(y)} \, dx \, dy, \tag{14}$$

where $\mu$, $\mu_x$ and $\mu_y$ are probability measures. We then take the set of points $z_i = (x_i, y_i)$ for the EEG $x_i$ and
EOG $y_i$, $i = 1, \ldots, N$. Then we find the $k$th closest neighbor of each $z_i$ according to the metric

$$\|z - z'\| = \max\{|x - x'|, |y - y'|\}. \tag{15}$$

The $k$th nearest neighbor is then projected onto the $x$ and $y$ axes, giving the distances $\epsilon_x(i)/2$ and $\epsilon_y(i)/2$
respectively. The mutual information is estimated by

$$I_k(X, Y) \approx \psi(k) - 1/k - \langle \psi(n_x) + \psi(n_y) \rangle + \psi(N), \tag{16}$$

where $n_x(i)$ and $n_y(i)$ are the numbers of points whose distance from $z_i$ along the $x$ and $y$ axes is at most
$\epsilon_x(i)/2$ and $\epsilon_y(i)/2$ respectively, $\psi(\cdot)$ is the digamma function given by

$$\psi(z) = \frac{d}{dz} \ln \Gamma(z), \tag{17}$$

and

$$\langle \cdots \rangle = \frac{1}{N} \sum_{i=1}^{N} E[\cdots(i)]. \tag{18}$$
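
A brute-force sketch of the estimator of Equation 16 is given below. It assumes that $\epsilon_x(i)/2$ and $\epsilon_y(i)/2$ are taken as the largest $x$ and $y$ distances among the $k$ nearest neighbours, which is a common reading of the second estimator of Kraskov et al. and coincides with projecting the $k$th neighbour when $k = 1$. The function names, the choice $k = 3$ and the test data are illustrative; a practical implementation would use a faster neighbour search.

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329

def digamma_of_integers(n_max):
    """Return psi(1), ..., psi(n_max) using psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    harmonic = np.concatenate(([0.0], np.cumsum(1.0 / np.arange(1, n_max))))
    return -EULER_GAMMA + harmonic           # entry n - 1 holds psi(n)

def ksg_mutual_information(x, y, k=3):
    """Estimate I(X, Y) in nats from paired samples with a k-nearest-neighbour estimator."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    psi = digamma_of_integers(n)
    total = 0.0
    for i in range(n):
        dx = np.abs(x - x[i])
        dy = np.abs(y - y[i])
        dz = np.maximum(dx, dy)              # maximum-norm distance of Equation 15
        dz[i] = np.inf                       # a point is not its own neighbour
        neighbours = np.argpartition(dz, k - 1)[:k]
        eps_x_half = dx[neighbours].max()
        eps_y_half = dy[neighbours].max()
        n_x = np.count_nonzero(dx <= eps_x_half) - 1   # exclude the point itself
        n_y = np.count_nonzero(dy <= eps_y_half) - 1
        total += psi[n_x - 1] + psi[n_y - 1]
    return psi[k - 1] - 1.0 / k - total / n + psi[n - 1]

# Hypothetical test data: an independent pair and a strongly correlated Gaussian pair.
rng = np.random.default_rng(0)
a = rng.standard_normal(500)
b = rng.standard_normal(500)
mi_independent = ksg_mutual_information(a, b)
mi_dependent = ksg_mutual_information(a, 0.9 * a + np.sqrt(1 - 0.81) * b)
```

For the correlated pair the exact value is $-\tfrac{1}{2}\ln(1 - 0.9^2) \approx 0.83$ nats, which gives a reference point for judging the estimate.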

3. RESULTS 

3.1. Blind signal separation 

We compared the blind signal separation for three algorithms, across four sets of data: two of thoracic and
abdominal (TA) breathing data, and two of EEG and EOG (EE) data. The differences in mutual information
between the signal data and the estimated source data are shown in Table 1. The higher the mutual information,
the better the algorithm is for separating the original sources, given the assumptions of that algorithm.



3.2. Wavelet denoising 

For each of the generated estimates of the sources (three blind signal separation algorithms applied to four 
pairs of signals) we estimated the mutual information between the estimates of the sources, and the denoised 
estimates of the sources. These are given in Table 2. We observe no difference in mutual information between 



Table 1. The difference in mutual information (in nats/sample) between the two estimated sources and the two signals,
$I_k$(est. sources) - $I_k$(signals), for the two sets of thoracic-abdominal (TA) data and the two sets of EEG and EOG
(EE) data. Nats are the units of information when a natural logarithm is used. This is computed for all three (SOBI,
JADE, and JCC) algorithms.

        TA1         TA2         EE1         EE2
SOBI    1.0319      0.4539      0.4522      0.6101
JADE    < 0.0001    < 0.0001    < 0.0001    < 0.0001
JCC     0.3640      0.4292      0.4032      0.5342



Table 2. The difference in mutual information (in nats/sample) between the JADE estimated sources and the wavelet
denoised estimated sources, $I_k$(est. denoised sources) - $I_k$(est. sources), computed for the two sets of
thoracic-abdominal (TA) data and the two sets of EEG and EOG (EE) data. Nats are the units of information when a
natural logarithm is used.

TA1         TA2         EE1         EE2
< 0.0001    < 0.0001    < 0.0001    0.0001



4. DISCUSSION & CONCLUSIONS 

For our particular set of thoracic and abdominal breathing data, the SOBI algorithm works well, with an
increase in the mutual information, probably because the sources have reasonably distinct spectra. Since the
JCC combines information from both the SOBI and JADE algorithms by way of joint diagonalization, it
inherits the problems associated with using the JADE algorithm for this data, namely that the sources are
not independent. The two sources have a high level of dependence, being almost synchronous during regular
breathing, tending to differ only for compliant chests in young children or when a breathing obstruction
occurs. 5, 7 Similarly, for the EEG and EOG data, although these are more independent, the SOBI algorithm
performs best at separating the original sources from the observed signals.

The wavelet denoising performs well, in that it preserves (as far as we can determine) the information present
in the signals. Further work will consider wavelet packet and matching pursuit denoising algorithms, 17, 19, 20
and how these affect mutual information between two different channels. We will also consider the effect of
swapping the order of the denoising and blind signal separation techniques; in theory this should make little
to no difference to the results.